Abstract



We state a quantum version of Bayes's rule for statistical inference and give a 
simple general derivation within the framework of generalized measurements. 
The rule can be applied to measurements on N copies of a system if the 
initial state of the N copies is exchangeable. As an illustration, we apply 
the rule to N qubits. Finally, we show that quantum state estimates derived 
via the principle of maximum entropy are fundamentally different from those 
obtained via the quantum Bayes rule. 






During the last decade, interest in Bayesian methods of statistical inference has increased
considerably. At the heart of the Bayesian approach is Bayes's rule, which indicates
how to update a state of knowledge in the light of new data. The simplest form of the rule
is

$$p(H|D) = \frac{p(D|H)\,p(H)}{p(D)} , \tag{1}$$

where $p(D|H)$ is the probability for the data $D$ given a hypothesis $H$, $p(H)$ is the prior
probability that the hypothesis is true, $p(H|D)$ is the posterior probability that the
hypothesis is true given the data, and $p(D) = \sum_H p(D|H)\,p(H)$ is the probability for the data
averaged over all hypotheses. The conceptual simplicity of Bayes's rule is a major strength
of the Bayesian approach.
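As a minimal sketch of Eq. (1), the following Python fragment updates a discrete prior over hypotheses. The function name `bayes_update` and the coin example are hypothetical and serve only to illustrate the rule.

```python
def bayes_update(prior, likelihood):
    """Return the posterior p(H|D) from the prior p(H) and likelihood p(D|H).

    Both arguments are dicts keyed by hypothesis label.
    """
    # p(D) = sum_H p(D|H) p(H): the data probability averaged over hypotheses
    p_data = sum(likelihood[h] * prior[h] for h in prior)
    if p_data == 0:
        raise ValueError("data has zero probability under the prior")
    # Eq. (1): p(H|D) = p(D|H) p(H) / p(D)
    return {h: likelihood[h] * prior[h] / p_data for h in prior}


# Hypothetical example: a coin is either fair or biased towards heads.
prior = {"fair": 0.5, "biased": 0.5}
likelihood_heads = {"fair": 0.5, "biased": 0.9}  # p(D = heads | H)
posterior = bayes_update(prior, likelihood_heads)
```

Observing heads shifts weight towards the biased hypothesis, while the posterior remains normalized.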

The problems of statistical inference and state estimation are of central importance in 
quantum information theory. After the early pioneering work on quantum inference [2-6]
and quantum state tomography [7-9], a large amount of work has been done on the subject
(see, e.g., Refs. [10]-[19]). In many of the cited papers, a quantum version of Bayes's rule
is used either implicitly or explicitly. Jones has derived a quantum Bayes rule for pure
states only. In this paper, we derive a general rule, valid both for pure and mixed states,
and give a precise condition for its validity.

We consider the following general inference problem. Let $\mathcal{H}$ be the Hilbert space of a
quantum system. The Hilbert space of $N$ copies of the system is given by the $N$-fold tensor
product, $\mathcal{H}^{\otimes N}$. Suppose one is given a (prior) state $\rho^{(M+N)}$ on
$\mathcal{H}^{\otimes(M+N)}$ and the results of measurements on $M$ subsystems. The task is to find
the (posterior) state of the remaining $N$ subsystems conditioned on the measurement results.
The problem is in principle completely solved by the theory of generalized measurements [20],
which prescribes the state of the total system after the measurement. There is no room in
quantum theory for an additional independent inference principle; any inference rule must be
derivable from the basic theory.

An arbitrary measurement on $M$ subsystems is described by a set of completely positive,
trace-decreasing operations, $\{\mathcal{F}_k\}$, which act on the selected $M$ subsystems. The
measurement result is $k$ with probability

$$p_k = \mathrm{tr}\bigl[\mathcal{F}_k(\rho^{(M+N)})\bigr] . \tag{2}$$

Since the operations $\mathcal{F}_k$ are completely positive, they can be expressed in the form

$$\mathcal{F}_k(\rho^{(M+N)}) = \sum_l (A_{kl} \otimes 1)\,\rho^{(M+N)}\,(A_{kl}^\dagger \otimes 1) , \tag{3}$$

where the $A_{kl}$ are arbitrary operators acting on the selected $M$ subsystems. The probabilities
$p_k$ can thus be rewritten as

$$p_k = \mathrm{tr}\bigl[(E_k \otimes 1)\,\rho^{(M+N)}\bigr] = \mathrm{tr}_M\bigl(E_k\,\rho^{(M)}\bigr) , \tag{4}$$

where

$$E_k = \sum_l A_{kl}^\dagger A_{kl} \tag{5}$$

is a positive semidefinite operator and

$$\sum_k E_k = 1 ; \tag{6}$$

i.e., the set $\{E_k\}$ forms a positive operator valued measure (POVM). In the last form of
Eq. (4), $\rho^{(M)}$ is the prior marginal density operator of the measured subsystems, and
$\mathrm{tr}_M$ denotes a trace over the measured subsystems.
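The construction of the POVM elements from the operators in Eq. (3) can be checked numerically. The sketch below assumes NumPy and a hypothetical unsharp sigma_z measurement on a single qubit; the sharpness parameter `eta` and the test state `rho` are illustrative choices, not taken from the text.

```python
import numpy as np

# Hypothetical Kraus operators for an unsharp sigma_z measurement on one qubit;
# the sharpness parameter eta is an assumption made for illustration.
eta = 0.8
P_up = np.array([[1, 0], [0, 0]], dtype=complex)
P_dn = np.array([[0, 0], [0, 1]], dtype=complex)
kraus = {
    +1: [np.sqrt((1 + eta) / 2) * P_up, np.sqrt((1 - eta) / 2) * P_dn],
    -1: [np.sqrt((1 - eta) / 2) * P_up, np.sqrt((1 + eta) / 2) * P_dn],
}

# Eq. (5): E_k = sum_l A_kl^dagger A_kl
povm = {k: sum(A.conj().T @ A for A in ops) for k, ops in kraus.items()}

# Eq. (6): the POVM elements should sum to the identity
completeness = sum(povm.values())

# Eq. (4) for a single measured qubit: p_k = tr(E_k rho)
# rho has Bloch vector (0.2, 0, 0.3), chosen arbitrarily.
rho = 0.5 * np.array([[1.3, 0.2], [0.2, 0.7]], dtype=complex)
probs = {k: float(np.real(np.trace(E @ rho))) for k, E in povm.items()}
```

For this choice the outcome probabilities reduce to $(1 \pm \eta z)/2$, so a sharper measurement (larger `eta`) makes the outcome more informative about $z$.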

If the measurement result is $k$, the (normalized) state of all $M + N$ systems after the
measurement is

$$\rho_k^{(M+N)} = \frac{1}{p_k}\,\mathcal{F}_k(\rho^{(M+N)}) , \tag{7}$$

where $\mathcal{F}_k(\rho^{(M+N)})$, given in Eq. (3), is the unnormalized state conditioned on
measurement outcome $k$. Performing a partial trace over the selected $M$ subsystems yields the
posterior state of the remaining $N$ subsystems,

$$\rho_k^{(N)} = \mathrm{tr}_M\,\rho_k^{(M+N)} . \tag{8}$$

An exact quantum analogue of the classical Bayes rule would write this posterior state 
as a mixture in which the updating as a consequence of obtaining result k (the "data") 
would appear in the probabilities in the mixture, but not in the density operators that 
contribute to the mixture. Classically it is possible to obtain information about a system 
without disturbing it, while quantum mechanically it is not; hence, Eq. (8) must include
both the updating due to the information acquired and the disturbing effects of the
measurement. In general, this only takes the form of the classical Bayes rule if the measured
and unmeasured systems entering Eq. (8) are initially unentangled.
Notice also that for a product prior,

$$\rho^{(M+N)} = \rho_0^{\otimes(M+N)} = \underbrace{\rho_0 \otimes \cdots \otimes \rho_0}_{M+N\ \text{terms}} , \tag{9}$$

where $\rho_0$ is some state on $\mathcal{H}$, the posterior state is $\rho_k^{(N)} = \rho_0^{\otimes N}$,
irrespective of the measurement result. No learning from data is possible for product priors.
This shows in particular that the totally mixed state for $M + N$ subsystems, which is both a
product state and the state of maximum entropy on $\mathcal{H}^{\otimes(M+N)}$, does not allow
learning from measured data.
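The absence of learning for product priors can be verified in a few lines. The following worked step is a sketch that uses only the operator-sum form of the measurement, the normalization of the post-measurement state, and the partial trace over the measured subsystems; writing $\rho_0^{\otimes M}$ for the measured block is our shorthand.

```latex
\begin{align}
\mathcal{F}_k\bigl(\rho_0^{\otimes(M+N)}\bigr)
  &= \Bigl[\sum_l A_{kl}\,\rho_0^{\otimes M}A_{kl}^\dagger\Bigr]\otimes\rho_0^{\otimes N}, \\
p_k &= \mathrm{tr}\Bigl[\sum_l A_{kl}\,\rho_0^{\otimes M}A_{kl}^\dagger\Bigr], \\
\rho_k^{(N)} &= \frac{1}{p_k}\,\mathrm{tr}_M\,\mathcal{F}_k\bigl(\rho_0^{\otimes(M+N)}\bigr)
  = \rho_0^{\otimes N}.
\end{align}
```

The outcome-dependent factor is a number that cancels against $p_k$, so the unmeasured subsystems retain their prior state for every result $k$.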
In many practical situations, one can restrict attention to prior states of the form

$$\rho^{(N)} = \int d\rho\; p(\rho)\,\rho^{\otimes N} , \tag{10}$$

where $d\rho$ is a measure on density operator space and $p(\rho)$ is a normalized generating
function, $\int d\rho\, p(\rho) = 1$. Prior states of the form (10) arise, e.g., if each subsystem
is prepared in the same, unknown way, as in quantum state tomography. A state of $N$
subsystems, $\rho^{(N)}$, can be expressed in the form (10) if and only if it is exchangeable,
i.e., if (i) it is invariant under permutations of the subsystems and (ii) for any $M > 0$,
there is a state $\rho^{(N+M)}$ of $N + M$ subsystems that is invariant under permutations of the
subsystems and that satisfies $\rho^{(N)} = \mathrm{tr}_M(\rho^{(N+M)})$ [21,22]. The expansion (10)
is then unique. This is the quantum version of the fundamental representation theorem due to
de Finetti [23]; for an elementary proof of the quantum theorem see Ref. [24].



The significance of part (ii) of the definition of exchangeability given above is illustrated 
by the GHZ state $\rho_{\rm GHZ} = |\psi_{\rm GHZ}\rangle\langle\psi_{\rm GHZ}|$, where
$|\psi_{\rm GHZ}\rangle = (|000\rangle + |111\rangle)/\sqrt{2}$. This three-particle state is
invariant under permutations of the three subsystems, but it is clear that
$\rho_{\rm GHZ}$ cannot be obtained by a partial trace from a four-particle state that is invariant
under permutations of all four particles. The GHZ state is thus not exchangeable, in accordance
with the fact that it cannot be written in the form (10).
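One way to make this explicit, offered here as a supplementary sketch, uses the standard fact that a pure marginal state factorizes from the rest of the system. The four-particle state $\rho^{(4)}$ and the single-particle state $\sigma$ are symbols introduced only for this argument.

```latex
\begin{align}
\mathrm{tr}_4\,\rho^{(4)} = \rho_{\rm GHZ}\ \text{(pure)}
  \;&\Longrightarrow\; \rho^{(4)} = \rho_{\rm GHZ}\otimes\sigma, \\
\mathrm{tr}_1\,\rho^{(4)}
  &= \tfrac12\bigl(|00\rangle\langle 00| + |11\rangle\langle 11|\bigr)\otimes\sigma
  \;\neq\; \rho_{\rm GHZ}.
\end{align}
```

Permutation invariance of $\rho^{(4)}$ would require every three-particle marginal to equal $\rho_{\rm GHZ}$. The marginal in the second line is mixed, so no symmetric four-particle extension exists.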

If the condition of exchangeability is fulfilled, the question of finding a suitable prior 
state reduces to finding a suitable prior measure $p(\rho)\,d\rho$ in the expansion (10). Much work
has been done on suitable prior measures on density operator space (see, e.g., [12], [25]-[27]).
As in the classical theory of inference [1], there exists no unique choice of prior measure;
different kinds of prior information lead to different prior measures.

The rule of inference, however, becomes extremely simple if the prior state is of the form
(10). In this case, we show below that if a measurement performed on the first subsystem
yields result $k$, the posterior state of the remaining $N - 1$ subsystems is given by

$$\rho_k^{(N-1)} = \int d\rho\; p(\rho|k)\,\rho^{\otimes(N-1)} , \tag{11}$$

where

$$p(\rho|k) = \frac{p(k|\rho)\,p(\rho)}{p_k} . \tag{12}$$

Here $p(k|\rho) = \mathrm{tr}(E_k\rho)$ is the probability of obtaining the measurement result $k$ for
a single subsystem, given that the state of the single subsystem is $\rho$, and
$p_k = \int d\rho\, p(k|\rho)\,p(\rho)$ is the average probability of obtaining $k$. This is the
quantum Bayes rule; it is completely analogous to the classical rule (1).
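To make the rule concrete, the sketch below applies it to a generating function supported on finitely many candidate qubit states, so that the integral over density operators becomes a sum. The function `quantum_bayes_update`, the projective sigma_z measurement, and the two candidate Bloch vectors are assumptions chosen for illustration.

```python
def quantum_bayes_update(weights, bloch_z, outcome):
    """Update the weights p(rho_i) after a sigma_z result on one subsystem.

    weights  -- prior weights p(rho_i) of a discrete generating function
    bloch_z  -- z components of the Bloch vectors of the candidate states rho_i
    outcome  -- measured eigenvalue, +1 or -1
    """
    # p(k|rho_i) = tr(E_k rho_i) = (1 +/- z_i)/2 for a projective sigma_z measurement
    likelihood = [(1 + outcome * z) / 2 for z in bloch_z]
    # p_k = sum_i p(k|rho_i) p(rho_i), the discrete analogue of the integral for p_k
    p_k = sum(l * w for l, w in zip(likelihood, weights))
    # Eq. (12): p(rho_i|k) = p(k|rho_i) p(rho_i) / p_k
    posterior = [l * w / p_k for l, w in zip(likelihood, weights)]
    return posterior, p_k


# Hypothetical prior: the preparation device emits one of two mixed states.
prior = [0.5, 0.5]
bloch_z = [0.8, -0.2]
posterior, p_plus = quantum_bayes_update(prior, bloch_z, +1)
```

Only the weights change; the candidate density operators themselves are untouched, which is what makes the rule analogous to the classical one.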

In the special case that the integration in Eq. (11) is restricted to pure states, the rule
(12) has been derived by Jones and applied to purifications of mixed states by Buzek
et al. Tarrach and Vidal [15] have used this rule to find optimal measurements on $N$
copies of a system, identically prepared in an unknown mixed state by some preparation
device. To our knowledge, Eq. (12) has not been derived in the general context considered
here.

If measurements are performed on several subsystems individually, the rule (12) can be
simply iterated. Although the situation considered here, where measurements are done one
subsystem at a time, is in practice the most important, it is straightforward to generalize
the rule to the case of collective measurements on several subsystems.
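The iteration can be sketched for a discrete generating function as follows. The candidate states, prior weights, and outcome record are hypothetical; the point is only that updating one subsystem at a time agrees with a single update using the product of the single-subsystem likelihoods.

```python
def update(weights, bloch_z, outcome):
    """One application of the quantum Bayes rule for a sigma_z result on one qubit."""
    unnorm = [w * (1 + outcome * z) / 2 for w, z in zip(weights, bloch_z)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


bloch_z = [0.8, 0.0, -0.5]      # hypothetical candidate states (z components)
prior = [0.2, 0.5, 0.3]         # hypothetical prior weights
outcomes = [+1, +1, -1, +1]     # hypothetical record of sigma_z results

# Iterate the rule, one measured subsystem at a time.
iterated = prior
for k in outcomes:
    iterated = update(iterated, bloch_z, k)

# Equivalent batch form: multiply all single-subsystem likelihoods first.
m_plus = outcomes.count(+1)
m_minus = outcomes.count(-1)
batch = [w * ((1 + z) / 2) ** m_plus * ((1 - z) / 2) ** m_minus
         for w, z in zip(prior, bloch_z)]
norm = sum(batch)
batch = [b / norm for b in batch]
```

Because the result depends on the outcomes only through the counts `m_plus` and `m_minus`, the order in which the subsystems are measured is irrelevant.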

Strictly speaking, the generating function $p(\rho)$ should not be called a probability; after
all, a mixed state $\rho$ is itself a summary of incomplete knowledge about a subsystem.
Nevertheless, the content of the quantum Bayes rule (12) is that the functions $p(\rho)$ and
$p(\rho|k)$ can be used as if they were a prior probability and a conditional posterior
probability for density operators. This interpretation is obviously appropriate in the case
that the exchangeable state (10) is known to have arisen from an experiment in which each
subsystem is prepared in the same unknown state, with $p(\rho)$ then being the probability that
this unknown state is $\rho$.
To derive the rule (12), we denote by $\{\mathcal{F}_k\}$ the set of completely positive,
trace-decreasing operations which describe the measurement on the first subsystem. The result
of the measurement is $k$ with probability

$$p_k = \mathrm{tr}\bigl[\mathcal{F}_k(\rho^{(N)})\bigr] = \int d\rho\; p(k|\rho)\,p(\rho) . \tag{13}$$

If the measurement result is $k$, the state of all $N$ subsystems after the measurement is

$$\rho_k^{(N)} = \frac{1}{p_k}\int d\rho\; p(\rho)\,\mathcal{F}_k(\rho)\otimes\rho^{\otimes(N-1)} , \tag{14}$$

where, by a slight abuse of notation, we denote by $\mathcal{F}_k(\rho)$ the unnormalized state of a
single subsystem with premeasurement state $\rho$ conditioned on the measurement result $k$. A
partial trace over the first subsystem gives the state of the remaining $N - 1$ subsystems,

$$\begin{aligned}
\rho_k^{(N-1)} &= \mathrm{tr}_1\bigl(\rho_k^{(N)}\bigr) \\
&= \frac{1}{p_k}\int d\rho\; p(\rho)\,\mathrm{tr}\bigl[\mathcal{F}_k(\rho)\bigr]\,\rho^{\otimes(N-1)} \\
&= \frac{1}{p_k}\int d\rho\; p(\rho)\,\mathrm{tr}(E_k\rho)\,\rho^{\otimes(N-1)} \\
&= \int d\rho\; \frac{p(\rho)\,p(k|\rho)}{p_k}\,\rho^{\otimes(N-1)} \\
&= \int d\rho\; p(\rho|k)\,\rho^{\otimes(N-1)} ,
\end{aligned} \tag{15}$$

where in the last line we have substituted $p(\rho|k)$ for the right-hand side of Eq. (12). This
completes the derivation.
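The derivation can be cross-checked numerically for two qubits. The sketch below assumes NumPy, a discrete generating function with two hypothetical candidate states, and a projective sigma_z = +1 outcome on the first qubit. It compares the posterior obtained directly from the post-measurement state with the one given by the quantum Bayes rule.

```python
import numpy as np

def qubit(x, y, z):
    """Density matrix (1 + x sx + y sy + z sz)/2 for a Bloch vector (x, y, z)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    return 0.5 * (np.eye(2) + x * sx + y * sy + z * sz)

# Hypothetical discrete generating function: two candidate states and weights.
states = [qubit(0.3, 0.0, 0.6), qubit(-0.4, 0.2, -0.5)]
weights = np.array([0.7, 0.3])

# Exchangeable two-qubit prior: a discrete mixture of product states rho (x) rho.
rho2 = sum(w * np.kron(r, r) for w, r in zip(weights, states))

# Projective sigma_z = +1 measurement on the first qubit: F_k(rho) = P rho P.
P = np.array([[1, 0], [0, 0]], dtype=complex)
A = np.kron(P, np.eye(2))
unnormalized = A @ rho2 @ A.conj().T
p_k = np.real(np.trace(unnormalized))

# Direct route: normalize and trace out the first (measured) qubit.
post_direct = np.einsum("ijik->jk", unnormalized.reshape(2, 2, 2, 2)) / p_k

# Bayes route: reweight the candidate states by p(k|rho) p(rho) / p_k.
likelihood = np.array([np.real(np.trace(P @ r)) for r in states])
post_weights = likelihood * weights / np.sum(likelihood * weights)
post_bayes = sum(w * r for w, r in zip(post_weights, states))
```

The two routes agree to numerical precision, and `p_k` equals the prior-averaged likelihood.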

We now illustrate the rule for a system of $M + N$ qubits, for which the Hilbert space
$\mathcal{H}$ of each subsystem is two-dimensional. An arbitrary exchangeable state of $M + N$
qubits can be written in the form

$$\rho^{(M+N)} = \iiint dx\,dy\,dz\; p(x,y,z)\,\rho_{x,y,z}^{\otimes(M+N)} , \tag{16}$$

where $\rho_{x,y,z} = \tfrac12(1 + x\sigma_x + y\sigma_y + z\sigma_z)$ and the integrals range over
the volume of the sphere of radius 1. Here $\sigma_x$, $\sigma_y$, $\sigma_z$ are the Pauli
operators, and 1 denotes the unit operator.

Now assume that $\sigma_z$ measurements are performed on $M$ qubits. The probability of
obtaining the result $\pm 1$, given state $\rho_{x,y,z}$, in a measurement on a single qubit is

$$p(\pm 1|\rho_{x,y,z}) = \tfrac12(1 \pm z) . \tag{17}$$

If the $M$ measurements of $\sigma_z$ yield $M_+$ results of $+1$ and $M_-$ results of $-1$, where
$M_+ + M_- = M$, then the state of the remaining $N$ qubits is

$$\rho^{(N)}_{M_+,M_-} = \iiint dx\,dy\,dz\; p(x,y,z|M_+,M_-)\,\rho_{x,y,z}^{\otimes N} , \tag{18}$$
where

$$p(x,y,z|M_+,M_-) = \mathcal{N}\,p(x,y,z)\left(\frac{1+z}{2}\right)^{M_+}\left(\frac{1-z}{2}\right)^{M_-} , \tag{19}$$

$\mathcal{N}$ being a normalization factor.
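Since the likelihood factors depend only on $z$, the posterior for $z$ can be evaluated on a grid. The following sketch assumes, purely for illustration, a prior that is uniform over the Bloch ball, whose marginal density for $z$ is proportional to $1 - z^2$; the function name and the counts are hypothetical.

```python
def z_posterior(m_plus, m_minus, prior_z, grid_size=2001):
    """Marginal posterior density of z on a grid, following Eq. (19).

    prior_z is the prior marginal density of z (it need not be normalized).
    """
    zs = [-1 + 2 * i / (grid_size - 1) for i in range(grid_size)]
    dz = zs[1] - zs[0]
    unnorm = [prior_z(z) * ((1 + z) / 2) ** m_plus * ((1 - z) / 2) ** m_minus
              for z in zs]
    norm = sum(unnorm) * dz  # fixes the normalization factor of Eq. (19)
    return zs, [u / norm for u in unnorm], dz


# Assumed prior: uniform over the Bloch ball, z marginal proportional to 1 - z^2.
def uniform_ball(z):
    return 1 - z * z


zs, density, dz = z_posterior(m_plus=75, m_minus=25, prior_z=uniform_ball)
mean_z = sum(z * d for z, d in zip(zs, density)) * dz
mode_z = zs[density.index(max(density))]
```

With 75 results of $+1$ and 25 of $-1$, the posterior peaks slightly below the observed frequency difference 0.5, because the assumed prior pulls the estimate towards the centre of the ball.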

In the limit $M \to \infty$, assuming $(M_+ - M_-)/M \to E_z$, we obtain

$$p(x,y,z|M_+,M_-) \to p(x,y|E_z)\,\delta(z - E_z) , \tag{20}$$

where $p(x,y|E_z) = p(x,y,E_z) \big/ \iint dx\,dy\; p(x,y,E_z)$ is the prior conditional probability
for $x$ and $y$, given that $z = E_z$. Equation (20) expresses clearly the gain in information
about $z$.
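The concentration of the posterior on a single value of $z$ can be made quantitative with a standard Laplace-type estimate. The following is a sketch, valid for $|z^*| < 1$; the log-likelihood $L(z)$, its maximizer $z^*$, and the width $\Delta z$ are symbols introduced here.

```latex
\begin{align}
L(z) &= M_+\ln\frac{1+z}{2} + M_-\ln\frac{1-z}{2}, \\
L'(z^*) = 0 \;&\Longrightarrow\; z^* = \frac{M_+ - M_-}{M}, \\
L''(z^*) &= -\frac{M}{1-z^{*2}}
  \;\Longrightarrow\; \Delta z \simeq \sqrt{\frac{1-z^{*2}}{M}}.
\end{align}
```

The likelihood thus peaks at the observed frequency difference with a width that shrinks as $1/\sqrt{M}$, which is the origin of the $\delta$ function in the limit.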
For an isotropic prior,

$$p(x,y,z) = p\bigl(\sqrt{x^2 + y^2 + z^2}\bigr) , \tag{21}$$

the marginal state for a single subsystem before any measurements is the maximally mixed
state $\rho^{(1)} = \tfrac12 1$. After $M$ measurements of $\sigma_z$, in the limit $M \to \infty$, the
marginal state for a single additional subsystem is

$$\rho_{E_z} = \tfrac12(1 + E_z\sigma_z) , \tag{22}$$



which is the state obtained in [14]. Our analysis puts this in a clear perspective: the data
dictate the expectation value $\langle\sigma_z\rangle = E_z$ for the state (22); for an isotropic
prior, the $\sigma_z$ measurements tell one nothing about the direction of the spin in the $x$-$y$
plane, so $\sigma_x$ and $\sigma_y$ retain the zero expectation values that apply to the prior
marginal state of a single subsystem.

It is important to note that the state $\rho_{E_z}$ does not allow one to make predictions
about frequencies in future repeated measurements of, e.g., the observable $\sigma_x$. Although
$\mathrm{tr}(\rho_{E_z}\sigma_x) = 0$, it would be wrong to predict that the frequency of the outcome
$+1$ in a large number of future $\sigma_x$ measurements will be close to 1/2. The correct
prediction for future $\sigma_x$ measurements follows from the full state $\rho^{(N)}$ with the
limiting posterior (20); for the probability of obtaining $N_+$ results of $+1$ and $N_-$ results
of $-1$ in $N$ measurements of $\sigma_x$, we get

$$p(N_+,N_-|E_z) = \frac{N!}{N_+!\,N_-!}\iint dx\,dy\; p(x,y|E_z)\left(\frac{1+x}{2}\right)^{N_+}\left(\frac{1-x}{2}\right)^{N_-} . \tag{23}$$

Only in the extreme case that the prior has the special form $p(x,y,z) = p(y,z)\,\delta(x)$ does
the probability (23) become identical to the prediction $p(N_+,N_-) = 2^{-N} N!/(N_+!\,N_-!)$ that
would follow from assigning the product state $\rho_{E_z}^{\otimes N}$ to the $N$ subsystems. It is
clear that this prediction is not implied by the $\sigma_z$ measurement data and is therefore
unwarranted unless there is additional prior information.
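The difference between the two predictions can be illustrated numerically. The sketch below assumes a prior that is uniform over the Bloch ball (our choice, not fixed by the text); conditioned on $z = E_z$, the pair $(x, y)$ is then uniform on a disc of radius $\sqrt{1 - E_z^2}$, and the marginal density of $x$ is proportional to $\sqrt{R^2 - x^2}$. The function names are hypothetical.

```python
from math import comb, sqrt

def sigma_x_prediction(n_plus, n_minus, e_z, grid_size=4001):
    """Eq. (23) for a prior uniform over the Bloch ball (an assumption).

    Conditioned on z = e_z, (x, y) is uniform on a disc of radius
    R = sqrt(1 - e_z^2), so the marginal density of x is proportional to
    sqrt(R^2 - x^2).
    """
    radius = sqrt(1 - e_z ** 2)
    dx = 2 * radius / grid_size
    xs = [-radius + (i + 0.5) * dx for i in range(grid_size)]  # midpoint rule
    density = [sqrt(radius ** 2 - x ** 2) for x in xs]
    norm = sum(density)
    integral = sum(d * ((1 + x) / 2) ** n_plus * ((1 - x) / 2) ** n_minus
                   for x, d in zip(xs, density)) / norm
    return comb(n_plus + n_minus, n_plus) * integral

def product_state_prediction(n_plus, n_minus):
    """Binomial prediction 2^-N N!/(N+! N-!) that follows from a product state."""
    n = n_plus + n_minus
    return comb(n, n_plus) / 2 ** n

bayes = sigma_x_prediction(2, 0, e_z=0.0)
product = product_state_prediction(2, 0)
```

For $N_+ = 2$, $N_- = 0$, and $E_z = 0$, the exchangeable state gives a probability of about 0.3125 for two equal outcomes, compared with 0.25 from the product state: under this prior, the outcomes of future $\sigma_x$ measurements are correlated.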

The marginal state of Eq. (22) can also be derived from the principle of maximum
entropy (MAXENT) [28,29]. If all that is known about the state $\rho$ of some system is the
expectation value of one or several observables, the MAXENT state assignment results
from maximizing the von Neumann entropy of $\rho$ subject to the constraints given by the
expectation values (see Ref. [30] for a derivation of the MAXENT principle in the quantum
case).

In the example above, the MAXENT assignment following from the constraint
$\langle\sigma_z\rangle = E_z$ for a single subsystem is identical to the marginal state (22). This
identity has also been noted by Buzek et al. [14], who state that "... as soon as the number of
measurements becomes large then [the] Bayesian inference scheme becomes equal to the
reconstruction scheme based on the Jaynes principle of maximum entropy ... ." This statement
is misleading, however, since the equality holds only for the marginal state of a single
subsystem (and even then only under the isotropy assumption (21) for the prior). Unlike the
full state $\rho^{(N)}_{M_+,M_-}$ in Eq. (18), found via Bayes's rule, the single-subsystem state
derived via MAXENT does not allow one to make predictions for measurements on more than one
subsystem.
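For reference, the single-qubit MAXENT assignment can be worked out explicitly. The following is a sketch of the standard calculation for $|E_z| < 1$; the Lagrange multiplier $\lambda$ is a symbol introduced here.

```latex
\begin{align}
\rho_{\rm MAXENT} &= \frac{e^{-\lambda\sigma_z}}{\mathrm{tr}\,e^{-\lambda\sigma_z}}
  = \frac{\cosh\lambda\,1 - \sinh\lambda\,\sigma_z}{2\cosh\lambda}
  = \tfrac12\bigl(1 - \tanh\lambda\,\sigma_z\bigr), \\
\langle\sigma_z\rangle &= -\tanh\lambda = E_z
  \;\Longrightarrow\; \rho_{\rm MAXENT} = \tfrac12\bigl(1 + E_z\sigma_z\bigr).
\end{align}
```

The constraint fixes only the $z$ component of the Bloch vector; maximizing the entropy sets the other two components to zero, which is why the result coincides with the Bayesian single-subsystem marginal for an isotropic prior.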

On the other hand, applying MAXENT directly to $N$ subsystems fails for the following
reason, well known from classical probability theory [31,32]. Maximizing the von Neumann
entropy of $\rho^{(N)}$ subject to the constraint that $\langle\sigma_z\rangle = E_z$ for each
subsystem yields the product state $\rho^{(N)}_{\rm MAXENT} = \rho_{E_z}^{\otimes N}$. As discussed
above, this state assignment is unwarranted because it leads to predictions for, say, future
$\sigma_x$ measurements which are in no way implied by the constraint on $\langle\sigma_z\rangle$.
Furthermore, any product state assignment precludes learning from subsequent measurements,
even though that should be possible, as was discussed in the paragraph after Eq. (9).

If the measurements on individual subsystems correspond to an informationally complete
POVM [33] or if they contain sequences of measurements of a tomographically complete set
of observables [14], the posterior probability on density operators approaches a $\delta$ function
in the limit of many measurements. This is the case of quantum state tomography,
which can thus be viewed as a special case of quantum Bayesian inference. In this limit,
the exact form of the prior probability on density operators becomes irrelevant. In all other
situations, however, there will be some dependence on this prior.
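The fading influence of the prior can be illustrated for the one parameter that $\sigma_z$ measurements do determine. The sketch below is not a tomographically complete scheme: it tracks only the posterior mean of $z$, for two hypothetical priors, and works in log space to avoid numerical underflow for large counts.

```python
from math import exp, log

def posterior_mean_z(m_plus, m_minus, log_prior, grid_size=4001):
    """Posterior mean of z on a grid, computed in log space to avoid underflow."""
    # Interior grid points only, so that log(1 +/- z) stays finite.
    zs = [-1 + 2 * (i + 0.5) / grid_size for i in range(grid_size)]
    logs = [log_prior(z) + m_plus * log((1 + z) / 2) + m_minus * log((1 - z) / 2)
            for z in zs]
    top = max(logs)
    weights = [exp(l - top) for l in logs]
    return sum(z * w for z, w in zip(zs, weights)) / sum(weights)


# Two hypothetical priors for z: the marginal of a uniform Bloch-ball prior,
# and a prior skewed towards z = +1.
def flat_ball(z):
    return log(1 - z * z)


def skewed(z):
    return log(1 - z * z) + 3 * log(1 + z)


few = [posterior_mean_z(7, 3, p) for p in (flat_ball, skewed)]
many = [posterior_mean_z(7000, 3000, p) for p in (flat_ball, skewed)]
```

With ten measurements the two priors give visibly different estimates of $z$; with ten thousand measurements at the same frequencies both estimates lie close to 0.4. The components $x$ and $y$, which are not probed by $\sigma_z$, remain governed by the prior.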

TAB thanks Bob Griffiths and Oliver Cohen for helpful discussions. TAB was supported 
in part by NSF Grant No. PHY-9900755, and CMC received partial support from ONR 
Grant No. N00014-93-1-0016. 